A 70M-parameter LLM I pre-trained from scratch on my laptop, following the Llama-3 architecture and using the Llama-2 tokenizer. It was trained on 705.8 MB of uncompressed text from Discord and other sources. The model generalizes well enough to chat like a random person on Discord, but it is very, very dumb.
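
For a sense of scale, here is a rough sketch of what a Llama-style config in the ~70M-parameter range could look like in Hugging Face `transformers`. The hyperparameters below (hidden size, layer count, head counts, context length) are illustrative guesses, not the ones actually used for this model; the only value taken from the description is the 32k vocabulary of the Llama-2 tokenizer.

```python
# Illustrative only: a small Llama-style model around the 70M-parameter mark.
# All hyperparameters here are assumptions, not the real training config.
from transformers import LlamaConfig, LlamaForCausalLM

config = LlamaConfig(
    vocab_size=32000,            # Llama-2 tokenizer vocabulary size
    hidden_size=512,
    intermediate_size=1536,      # SwiGLU MLP width
    num_hidden_layers=12,
    num_attention_heads=8,
    num_key_value_heads=4,       # grouped-query attention, as in Llama-3
    max_position_embeddings=1024,
    rope_theta=10000.0,
)

model = LlamaForCausalLM(config)
total = sum(p.numel() for p in model.parameters())
print(f"{total / 1e6:.1f}M parameters")  # roughly 70M with these settings
```

With untied input/output embeddings, the 32k vocabulary alone accounts for roughly 33M of those parameters, which is why such small models spend most of their capacity on the embedding tables rather than the transformer layers.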